ftp.cs.arizona.edu

home *** CD-ROM | disk | FTP | other *** search

/ ftp.cs.arizona.edu / ftp.cs.arizona.edu.tar / ftp.cs.arizona.edu / icon / newsgrp / group93c.txt / 000026_icon-group-sender _Thu Jul 22 12:40:36 1993.msg < prev next >

Wrap

Internet Message Format | 1994-02-02 | 4KB

Received: by cheltenham.cs.arizona.edu; Thu, 22 Jul 1993 12:27:11 MST Message-Id: <9307221702.AA20596@uu3.psi.com> Date: Thu, 22 Jul 93 12:40:36 EDT From: Jerry Leichter <leichter@lrw.com> To: ICON-GROUP@cs.arizona.edu Subject: RE: More on Icon performance X-Vms-Mail-To: UUCP%"walter!flaubert!norman@uunet.uu.net" Status: R Errors-To: icon-group-errors@cs.arizona.edu I want to create dags, so that I need to check to see if the node I'm about to create already exists. The best way I know of is to create a string that is a function of the node's elements, then look for that string in a table. Does anyone else know a better way? Here's what I'm doing: procedure newitem(nt, dotpos, cat) static cache initial { cache := table() } s := nt || " " || image(cat) || " " || dotpos /cache[s] := item(nt, dotpos, cat) return cache[s] end Can anyone suggest something better? I'm especially interested in something less wasteful of string space. What you are using is an inefficient variation of a classic, efficient tech- nique used in compilers that goes under the name of value numbering. It arises in common subexpression removal. Suppose you had a tree representing an expression, and wished to find all common subexpressions within it. A common subexpression would be a subtree whose leaves were all leaves of the original tree. Assign to each node in the tree a value as follows: If the node is a leaf, assign it a unique value based on what the leaf contains. (In the compiler context, a leaf is either a variable or a constant. The value represents the particular value or constant.) If the node is of the form [v1,...,vn], where the vi's are the value numbers of then node's n children, assign a value f(v1,...,vn), which is unique among all the value numbers chosen for the leaves, and among all values for f with different sets of arguments. Then two nodes represent common subexpressions if and only if they have the same value number. What you are doing is using "value numbers" that are not numbers, but strings; and you are using image() for f(), and at the leaves. While this works, as you note it's expensive. If you use numerical values, it's much easier to build small data strutures in which you can look things up quickly. (f() is almost always computed by (a) looking up [v1,...,vn] in an appropriate table - usually a hash table, but some kind of trie might make sense; (b) if the node isn't already in the table, incrementing a counter and assigning its value as the value number. Given the nature of the leaves in a compiler, it's usually easy to come up with simple value numbers for them directly.) You'll need some way to distinguish between leaves, which you could still look up using image(), and interior nodes; and you'd need some representation of [v1,...,vn]. It's unclear from your code, but if indeed you are only concerned with binary nodes, and your data structures aren't too large, then you might be able to get away with 16-bitf value numbers, and represent [v1,v2] as 2^16*v1+v2, which can be computed and looked up in a table very quickly. More generally, a list would probably make more sense. If certain nodes in the tree have appropriate mathematical properties, it's possible to adjust the value numbering algorithm to make use of them. For example, if we consider "+" to be commutative, so that "a+b" and "b+a" are to be considered the same subexpression, then we can choose f() so that f(+,v1,v2) = f(+,v2,v1). (Generally you'd do this by exchanging f's second and third arguments before computing it if, say, the second is larger than the third.) Without knowing anything about your application, it's impossible to say whether this trick might be useful. -- Jerry